
Creators/Authors contains: "Conesa, Ana"


  1. Abstract Background

    Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis.

    Results

    We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage.

    Conclusions

    AtRTD3 is currently the most comprehensive Arabidopsis transcriptome. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
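The mismatch-profile idea in the AtRTD3 abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `JunctionRead` record, the `filter_junctions` helper, the ±10 base window, and the rule "keep a junction if at least one supporting read is mismatch-free near the splice site" are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

Junction = Tuple[str, int, int]  # (chromosome, intron start, intron end)


@dataclass(frozen=True)
class JunctionRead:
    """Hypothetical record: one long read supporting one splice junction."""
    junction: Junction
    # Mismatch positions relative to the nearest splice site (0 = at the site).
    mismatch_offsets: Tuple[int, ...]


def filter_junctions(reads: Iterable[JunctionRead],
                     window: int = 10,
                     min_clean_reads: int = 1) -> Set[Junction]:
    """Keep junctions with enough reads that are mismatch-free near the site.

    A read counts as "clean" when none of its mismatches fall within
    +/- `window` bases of the splice site (an assumed criterion).
    """
    clean_support = {}
    for read in reads:
        is_clean = all(abs(offset) > window for offset in read.mismatch_offsets)
        clean_support[read.junction] = clean_support.get(read.junction, 0) + int(is_clean)
    return {j for j, n in clean_support.items() if n >= min_clean_reads}


reads = [
    JunctionRead(("Chr1", 100, 200), (25,)),    # mismatch far from the junction
    JunctionRead(("Chr1", 100, 200), (3,)),     # mismatch close to the junction
    JunctionRead(("Chr1", 300, 400), (-2, 5)),  # only error-prone support
]
kept = filter_junctions(reads)
```

With these toy reads, only the first junction retains clean support under the default window; a real analysis would derive the offsets from alignments and tune the thresholds against known junctions.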
  2. Abstract

    Queuosine (Q) is a conserved hypermodification of the wobble base of tRNA containing GUN anticodons but the physiological consequences of Q deficiency are poorly understood in bacteria. This work combines transcriptomic, proteomic and physiological studies to characterize a Q-deficient Escherichia coli K12 MG1655 mutant. The absence of Q led to an increased resistance to nickel and cobalt, and to an increased sensitivity to cadmium, compared to the wild-type (WT) strain. Transcriptomic analysis of the WT and Q-deficient strains, grown in the presence and absence of nickel, revealed that the nickel transporter genes (nikABCDE) are downregulated in the Q– mutant, even when nickel is not added. This mutant is therefore primed to resist high nickel levels. Downstream analysis of the transcriptomic data suggested that the absence of Q triggers an atypical oxidative stress response, confirmed by the detection of slightly elevated reactive oxygen species (ROS) levels in the mutant, increased sensitivity to hydrogen peroxide and paraquat, and a subtle growth phenotype in a strain prone to accumulation of ROS.

     
  3. Abstract

    For long-duration space missions, it is critical to maintain health-associated homeostasis between astronauts and their microbiome. To achieve this goal it is important to more fully understand the host–symbiont relationship under the physiological stress conditions of spaceflight. To address this issue we examined the impact of a spaceflight analog, low-shear-modeled microgravity (LSMMG), on the transcriptome of the mutualistic bacterium Vibrio fischeri. Cultures of V. fischeri and a mutant defective in the global regulator Hfq (∆hfq) were exposed to either LSMMG or gravity conditions for 12 h (exponential growth) and 24 h (stationary phase growth). Comparative transcriptomic analysis revealed few to no significant differentially expressed genes between gravity and the LSMMG conditions in the wild type or mutant V. fischeri at exponential or stationary phase. There was, however, a pronounced change in transcriptomic profiles during the transition between exponential and stationary phase growth in both V. fischeri cultures including an overall decrease in gene expression associated with translational activity and an increase in stress response. There were also several upregulated stress genes specific to the LSMMG condition during the transition to stationary phase growth. The ∆hfq mutants exhibited a distinctive transcriptome profile with a significant increase in transcripts associated with flagellar synthesis and transcriptional regulators under LSMMG conditions compared to gravity controls. These results indicate the loss of Hfq significantly influences gene expression under LSMMG conditions in a bacterial symbiont. Together, these results improve our understanding of the mechanisms by which microgravity alters the physiology of beneficial host-associated microbes.

     
  4. Abstract Motivation

    The best-performing named entity recognition (NER) methods for biomedical literature are based on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. End-to-end neural networks achieve state-of-the-art performance without hand-crafted features and task-specific knowledge in non-biomedical NER tasks. However, in the biomedical domain, using the same architecture does not yield competitive performance compared with conventional machine learning models.

    Results

    We propose a novel end-to-end deep learning approach for biomedical NER tasks that leverages the local contexts based on n-gram character and word embeddings via a Convolutional Neural Network (CNN). We call this approach GRAM-CNN. To automatically label a word, this method uses the local information around a word. Therefore, the GRAM-CNN method does not require any specific knowledge or feature engineering and can in theory be applied to a wide range of existing NER problems. The GRAM-CNN approach was evaluated on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. These results put GRAM-CNN in the lead among biological NER methods. To the best of our knowledge, we are the first to apply CNN-based structures to BioNER problems.

    Availability and implementation

    The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN.

    Supplementary information

    Supplementary data are available at Bioinformatics online.
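The general idea of extracting local n-gram context with convolution filters, as the GRAM-CNN abstract describes, can be sketched in plain Python. This is an illustration under stated assumptions, not the GRAM-CNN architecture: the helper names `conv_max_pool` and `ngram_features`, the ReLU activation, the max-over-time pooling, and the toy embeddings and kernels are all hypothetical. The authors' actual implementation is in the repository linked above.

```python
from typing import Dict, List

Vector = List[float]
Kernel = List[Vector]  # one weight vector per position in the n-gram window


def conv_max_pool(embeddings: List[Vector], kernel: Kernel) -> float:
    """Slide one filter over consecutive embeddings, apply ReLU, then max-pool."""
    width = len(kernel)
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        score = sum(w * x
                    for k_row, e_row in zip(kernel, window)
                    for w, x in zip(k_row, e_row))
        responses.append(max(score, 0.0))  # ReLU keeps only positive evidence
    # Sequences shorter than the filter width contribute no evidence.
    return max(responses) if responses else 0.0


def ngram_features(embeddings: List[Vector],
                   kernels_by_width: Dict[int, List[Kernel]]) -> List[float]:
    """Concatenate pooled responses of filters with different n-gram widths."""
    return [conv_max_pool(embeddings, kernel)
            for width in sorted(kernels_by_width)
            for kernel in kernels_by_width[width]]


# Toy 2-dimensional embeddings for a three-token (or three-character) sequence.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = {
    1: [[[1.0, 1.0]]],              # one unigram filter
    2: [[[0.0, 1.0], [1.0, 1.0]]],  # one bigram filter
}
features = ngram_features(sequence, kernels)
```

In a trained model the kernel weights would be learned and the resulting feature vector passed to a sequence labeler; here the weights are fixed by hand only to show how filters of different widths summarize local context.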

     
  5. Abstract

    Over the last 25 years, biology has entered the genomic era and is becoming a science of ‘big data’. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of the biological enterprise and thereby stands in the way of genomic biology reaching its full potential. A brainstorming meeting funded by the National Science Foundation was held on 3–4 February 2022 to address this issue. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues obstructing the field were identified and discussed, which ultimately allowed solutions to be proposed for moving forward.

     